Skip to content

[client] Fall back to getent/id for SSH user lookup in static builds#5510

Merged
lixmal merged 1 commit intomainfrom
fix/ssh-nss-user-lookup
Mar 13, 2026
Merged

[client] Fall back to getent/id for SSH user lookup in static builds#5510
lixmal merged 1 commit intomainfrom
fix/ssh-nss-user-lookup

Conversation

@lixmal
Copy link
Copy Markdown
Collaborator

@lixmal lixmal commented Mar 5, 2026

Describe your changes

When CGO is disabled (CGO_ENABLED=0), Go's os/user only reads /etc/passwd and /etc/group directly, bypassing the system's NSS stack. This means users provided by SSSD, LDAP, Active Directory, or other NSS modules can't be resolved.

This adds getent passwd / id -G fallback for user lookup, group resolution, and shell detection. These commands go through the host's NSS stack regardless of how the Go binary was built. The fallback is only compiled in non-CGO builds (or when osusergo tag is set), since libc handles NSS natively when CGO is enabled.

Issue ticket number and link

Closes #4919

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

No user-facing API or configuration changes. Behavior is transparent.

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • More reliable user/group lookups with platform- and build-aware fallbacks, improved input validation, 5s command timeouts, and explicit Windows behavior for shell lookup.
    • Enhanced shell detection with ordered fallbacks: passwd, NSS/getent, SHELL env, then default.
  • Tests

    • Extensive cross-platform tests covering user lookup, group resolution, shell detection, parsing, validation, timeouts, and edge cases.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds NSS/getent-based user/group lookup and shell detection with platform-specific shims (CGO, non-CGO, Windows), wires them into SSH server user-switching and shell resolution, and adds comprehensive unit tests for parsing, validation, and fallback behavior.

Changes

Cohort / File(s) Summary
Core getent utilities & tests
client/ssh/server/getent_unix.go, client/ssh/server/getent_unix_test.go
Implements runGetent, parseGetentPasswd, runIdGroups, input validation and shell extraction with timeouts; adds extensive Unix-focused tests covering parsing, validation, execution paths, and edge cases.
Platform-specific shims
client/ssh/server/getent_cgo_unix.go, client/ssh/server/getent_nocgo_unix.go, client/ssh/server/getent_windows.go
Adds unexported helpers lookupWithGetent, currentUserWithGetent, groupIdsWithFallback with CGO delegation, non-CGO getent/id fallbacks (with debug logs), and Windows stubs (empty shell).
Integration wiring
client/ssh/server/user_utils.go, client/ssh/server/userswitching_unix.go, client/ssh/server/shell.go
Rewires DI to use new helpers; changes getSupplementaryGroups to accept *user.User and use groupIdsWithFallback; updates getUnixUserShell to prefer passwd → getent → env → default.
Cross-platform tests
client/ssh/server/getent_test.go
Adds cross-platform tests for lookup/current user, group IDs, shell retrieval, and failure conditions (including Windows expectations).

Sequence Diagram(s)

sequenceDiagram
    participant SSH as "SSH Server"
    participant StdLib as "os/user (stdlib)"
    participant Getent as "getent / id (system)"
    participant Logger as "Logger"

    rect rgba(100,150,200,0.5)
    Note over SSH,StdLib: User lookup flow
    SSH->>StdLib: user.Lookup(username)
    alt StdLib returns user
        StdLib-->>SSH: *user.User
    else StdLib returns error
        StdLib-->>SSH: error
        SSH->>Logger: debug fallback to getent
        SSH->>Getent: exec "getent passwd <username>"
        Getent-->>SSH: passwd entry / error
        SSH->>SSH: parseGetentPasswd()
        SSH-->>SSH: *user.User or original error
    end
    end

    rect rgba(150,200,100,0.5)
    Note over SSH,Getent: Group IDs flow
    SSH->>StdLib: u.GroupIds()
    alt StdLib returns ids
        StdLib-->>SSH: []string
    else StdLib returns error
        StdLib-->>SSH: error
        SSH->>Logger: debug fallback to id
        SSH->>Getent: exec "id -G <username>"
        Getent-->>SSH: "gid1 gid2" / error
        SSH->>SSH: parse -> []string or original error
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • pappz

Poem

🐰 I hopped through passwd lines and fields so deep,
I sniffed out shells where hidden entries sleep.
With getent, id, and timeouts set just right,
Remote users hop back into sight.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding getent/id fallback for SSH user lookup in static builds (non-CGO environments).
Description check ✅ Passed The description comprehensively covers the problem, solution, rationale, and includes issue reference, checklist completion, and documentation decision.
Linked Issues check ✅ Passed The changes fully address issue #4919 by implementing getent/id fallback for NSS-based user resolution in non-CGO builds, enabling SSH access for LDAP/SSSD users.
Out of Scope Changes check ✅ Passed All changes are tightly scoped to the objective: adding NSS fallback mechanisms for user lookup, group resolution, and shell detection without modifying unrelated functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/ssh-nss-user-lookup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lixmal lixmal force-pushed the fix/ssh-nss-user-lookup branch from bc94487 to 40542b7 Compare March 5, 2026 10:45
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
client/ssh/server/getent_test.go (1)

67-71: Minor: Redundant assertion after empty check.

Line 70's assertion len(shell) > 0 is redundant since line 67's condition already ensures shell != "" at this point.

♻️ Suggested simplification
 	shell := getShellFromGetent(current.Uid)
 	if shell == "" {
 		t.Log("getShellFromGetent returned empty, getent may not be available")
-	} else {
-		assert.True(t, len(shell) > 0, "shell should be non-empty when getent is available")
 	}
+	// If we reach here with non-empty shell, the test passed
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ssh/server/getent_test.go` around lines 67 - 71, The test contains a
redundant assertion after checking for an empty shell: in getent_test.go the
if/else around getShellFromGetent() logs when shell == "" and then asserts
len(shell) > 0 in the else branch, which is unnecessary; simplify by removing
the redundant assert in the else branch (or replace the else with a single
assert that fails with a clear message when shell == ""), keeping reference to
getShellFromGetent and the test's logging/assertion logic.
client/ssh/server/getent_unix.go (1)

103-106: Returning nil, nil for empty output may cause confusion.

When id -G returns empty output, this returns (nil, nil). Callers may interpret this as "no groups" vs "error occurred". Consider returning an empty slice []string{} instead of nil to make the "no groups" case explicit.

♻️ Suggested improvement
 	trimmed := strings.TrimSpace(string(out))
 	if trimmed == "" {
-		return nil, nil
+		return []string{}, nil
 	}
 	return strings.Fields(trimmed), nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ssh/server/getent_unix.go` around lines 103 - 106, The function that
handles the output of `id -G` currently returns (nil, nil) when `trimmed == ""`,
which is ambiguous; update the branch that checks `trimmed :=
strings.TrimSpace(string(out))` so it returns an explicit empty slice (e.g.
`[]string{}`) and nil error instead of `nil, nil`—locate the function that
invokes `id -G`/parses groups (the block using `trimmed` in getent_unix.go) and
change the return to `return []string{}, nil` to clearly signal “no groups.”
client/ssh/server/getent_unix_test.go (1)

228-235: TestRunGetent_NotAvailable is non-deterministic and likely always skipped in CI.

Consider moving command lookup/execution behind a small injectable function so this path can be unit-tested deterministically instead of depending on host tooling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ssh/server/getent_unix_test.go` around lines 228 - 235,
TestRunGetent_NotAvailable relies on the host having/ not having the "getent"
binary and is non-deterministic; refactor runGetent to accept or call an
injectable lookup/exec hook (e.g., replace direct exec.LookPath/exec.Command
usage with a package-level variable/function like lookupPath(name string)
(string, error) and commandRunner(name string, args ...string) *Cmd) so tests
can override them. Change runGetent to use these hooks (defaulting to
exec.LookPath and exec.Command) and update TestRunGetent_NotAvailable to
override lookupPath to return an error, asserting runGetent("root") fails
deterministically. Ensure the hooks are only used in runGetent and exposed for
tests without changing public API semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/ssh/server/getent_unix_test.go`:
- Around line 237-253: Tests calling runIdGroups (TestRunIdGroups_CurrentUser,
TestRunIdGroups_NonexistentUser, and TestIdGroupsMatchStdlib) assume the
external `id` binary exists; locate these test functions and add the same guard
used for getent checks: call exec.LookPath("id") at the start of each test and
t.Skipf("skipping test: %v", err) when LookPath returns an error so tests are
skipped on minimal Unix images without `id`. Ensure you reference/run the guard
before invoking runIdGroups to avoid infrastructure-dependent failures.

---

Nitpick comments:
In `@client/ssh/server/getent_test.go`:
- Around line 67-71: The test contains a redundant assertion after checking for
an empty shell: in getent_test.go the if/else around getShellFromGetent() logs
when shell == "" and then asserts len(shell) > 0 in the else branch, which is
unnecessary; simplify by removing the redundant assert in the else branch (or
replace the else with a single assert that fails with a clear message when shell
== ""), keeping reference to getShellFromGetent and the test's logging/assertion
logic.

In `@client/ssh/server/getent_unix_test.go`:
- Around line 228-235: TestRunGetent_NotAvailable relies on the host having/ not
having the "getent" binary and is non-deterministic; refactor runGetent to
accept or call an injectable lookup/exec hook (e.g., replace direct
exec.LookPath/exec.Command usage with a package-level variable/function like
lookupPath(name string) (string, error) and commandRunner(name string, args
...string) *Cmd) so tests can override them. Change runGetent to use these hooks
(defaulting to exec.LookPath and exec.Command) and update
TestRunGetent_NotAvailable to override lookupPath to return an error, asserting
runGetent("root") fails deterministically. Ensure the hooks are only used in
runGetent and exposed for tests without changing public API semantics.

In `@client/ssh/server/getent_unix.go`:
- Around line 103-106: The function that handles the output of `id -G` currently
returns (nil, nil) when `trimmed == ""`, which is ambiguous; update the branch
that checks `trimmed := strings.TrimSpace(string(out))` so it returns an
explicit empty slice (e.g. `[]string{}`) and nil error instead of `nil,
nil`—locate the function that invokes `id -G`/parses groups (the block using
`trimmed` in getent_unix.go) and change the return to `return []string{}, nil`
to clearly signal “no groups.”

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e961c2cb-cdb7-4dcb-ae71-d9f77c48076d

📥 Commits

Reviewing files that changed from the base of the PR and between e601278 and bc94487.

📒 Files selected for processing (9)
  • client/ssh/server/getent_cgo_unix.go
  • client/ssh/server/getent_nocgo_unix.go
  • client/ssh/server/getent_test.go
  • client/ssh/server/getent_unix.go
  • client/ssh/server/getent_unix_test.go
  • client/ssh/server/getent_windows.go
  • client/ssh/server/shell.go
  • client/ssh/server/user_utils.go
  • client/ssh/server/userswitching_unix.go

Comment thread client/ssh/server/getent_unix_test.go
@lixmal lixmal force-pushed the fix/ssh-nss-user-lookup branch from 40542b7 to 94d8675 Compare March 5, 2026 10:54
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
client/ssh/server/getent_unix_test.go (1)

237-253: ⚠️ Potential issue | 🟡 Minor

Add id-binary availability guards to runIdGroups tests.

These tests directly execute id -G but don’t skip when id is missing, which makes them infra-dependent on minimal Unix environments.

Proposed fix
 func TestRunIdGroups_CurrentUser(t *testing.T) {
+	if _, err := exec.LookPath("id"); err != nil {
+		t.Skip("id not available on this system")
+	}
+
 	current, err := user.Current()
 	require.NoError(t, err)
@@
 func TestRunIdGroups_NonexistentUser(t *testing.T) {
+	if _, err := exec.LookPath("id"); err != nil {
+		t.Skip("id not available on this system")
+	}
+
 	_, err := runIdGroups("nonexistent_user_xyzzy_12345")
 	assert.Error(t, err)
 }
@@
 func TestIdGroupsMatchStdlib(t *testing.T) {
+	if _, err := exec.LookPath("id"); err != nil {
+		t.Skip("id not available on this system")
+	}
+
 	current, err := user.Current()
 	require.NoError(t, err)

Also applies to: 296-310

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ssh/server/getent_unix_test.go` around lines 237 - 253, The tests
TestRunIdGroups_CurrentUser and TestRunIdGroups_NonexistentUser call the
external "id -G" binary and must be skipped if that binary is not available;
update both tests (and the similar tests at lines 296-310) to early-check
exec.LookPath("id") and call t.Skipf("skipping test: id binary not found") when
LookPath returns an error so the tests are not infra-dependent; keep the guard
at the top of each test before calling runIdGroups to ensure the skip happens
before any assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@client/ssh/server/getent_unix_test.go`:
- Around line 237-253: The tests TestRunIdGroups_CurrentUser and
TestRunIdGroups_NonexistentUser call the external "id -G" binary and must be
skipped if that binary is not available; update both tests (and the similar
tests at lines 296-310) to early-check exec.LookPath("id") and call
t.Skipf("skipping test: id binary not found") when LookPath returns an error so
the tests are not infra-dependent; keep the guard at the top of each test before
calling runIdGroups to ensure the skip happens before any assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80a61b48-3534-4dfd-8777-5b7544748412

📥 Commits

Reviewing files that changed from the base of the PR and between bc94487 and 40542b7.

📒 Files selected for processing (9)
  • client/ssh/server/getent_cgo_unix.go
  • client/ssh/server/getent_nocgo_unix.go
  • client/ssh/server/getent_test.go
  • client/ssh/server/getent_unix.go
  • client/ssh/server/getent_unix_test.go
  • client/ssh/server/getent_windows.go
  • client/ssh/server/shell.go
  • client/ssh/server/user_utils.go
  • client/ssh/server/userswitching_unix.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/ssh/server/shell.go

@lixmal lixmal force-pushed the fix/ssh-nss-user-lookup branch from 94d8675 to 3234df8 Compare March 5, 2026 11:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/ssh/server/getent_test.go`:
- Around line 163-166: The test in getent_test.go asserts that
getUserShell(current.Uid) returns a path starting with '/', which is stricter
than runtime behavior and can make the test flaky; remove (or replace) the
absolute-path assertion so the test only requires a non-empty shell (keep
require.NotEmpty(t, shell, ...) as the check) and optionally replace the removed
assertion with a t.Logf noting non-absolute shells for debugging; update the
test around getUserShell(current.Uid) accordingly.

In `@client/ssh/server/getent_unix.go`:
- Around line 117-120: The current check that returns nil, nil when `trimmed ==
""` should instead return an error so the no-CGO path's fallback runs; replace
the `return nil, nil` at the `trimmed == ""` branch in getent_unix.go with an
error return (e.g., fmt.Errorf("empty output from id -G for user %s", username))
so that `groupIdsWithFallback` will call `u.GroupIds()` and not silently drop
NSS groups.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82c8c6a4-5b8e-4396-89bd-a7d5435f0506

📥 Commits

Reviewing files that changed from the base of the PR and between 40542b7 and 94d8675.

📒 Files selected for processing (9)
  • client/ssh/server/getent_cgo_unix.go
  • client/ssh/server/getent_nocgo_unix.go
  • client/ssh/server/getent_test.go
  • client/ssh/server/getent_unix.go
  • client/ssh/server/getent_unix_test.go
  • client/ssh/server/getent_windows.go
  • client/ssh/server/shell.go
  • client/ssh/server/user_utils.go
  • client/ssh/server/userswitching_unix.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • client/ssh/server/getent_nocgo_unix.go
  • client/ssh/server/shell.go
  • client/ssh/server/getent_unix_test.go

Comment thread client/ssh/server/getent_test.go
Comment thread client/ssh/server/getent_unix.go
@lixmal lixmal force-pushed the fix/ssh-nss-user-lookup branch from 3234df8 to 9ee4097 Compare March 5, 2026 11:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
client/ssh/server/getent_test.go (1)

71-72: ⚠️ Potential issue | 🟡 Minor

Relax strict absolute-shell assertions to match runtime fallback behavior.

These checks can be flaky because the runtime can fall back to $SHELL, which may be non-absolute in valid environments.

Proposed test adjustment
+import "os/exec"
@@
-	assert.True(t, shell[0] == '/', "shell should be an absolute path, got %q", shell)
+	if shell[0] != '/' {
+		_, err := exec.LookPath(shell)
+		assert.NoError(t, err, "shell should be absolute or resolvable in PATH, got %q", shell)
+	}

Also applies to: 124-125, 164-165

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ssh/server/getent_test.go` around lines 71 - 72, The test currently
asserts the shell is an absolute path with assert.True(t, shell[0] == '/', ...),
which is flaky because runtime can legitimately fall back to a non-absolute
$SHELL; replace that strict check with a relaxed one using
path/filepath.IsAbs(shell) (or strings.HasPrefix(shell, "/")) OR simply assert
that shell is non-empty and valid for the runtime (e.g. assert.NotEmpty(t,
shell) and skip absolute-path enforcement). Apply the same change to the other
occurrences in this file corresponding to the same assertion pattern (the lines
around the current occurrence and the ones mentioned at 124-125 and 164-165) so
all absolute-path assertions are relaxed consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/ssh/server/getent_test.go`:
- Around line 48-51: The test currently assumes every group ID in the groups
slice can be parsed via strconv.ParseUint, which fails for Windows SID-style IDs
like "S-1-..."; update the loop that iterates over groups (and the other two
identical occurrences) to first check if the gid string is a SID (e.g.,
strings.HasPrefix(gid, "S-")) and skip the numeric parse/assert for those
entries, otherwise call strconv.ParseUint(gid, 10, 32) and assert no error as
before; ensure you import strings if not already present.

---

Duplicate comments:
In `@client/ssh/server/getent_test.go`:
- Around line 71-72: The test currently asserts the shell is an absolute path
with assert.True(t, shell[0] == '/', ...), which is flaky because runtime can
legitimately fall back to a non-absolute $SHELL; replace that strict check with
a relaxed one using path/filepath.IsAbs(shell) (or strings.HasPrefix(shell,
"/")) OR simply assert that shell is non-empty and valid for the runtime (e.g.
assert.NotEmpty(t, shell) and skip absolute-path enforcement). Apply the same
change to the other occurrences in this file corresponding to the same assertion
pattern (the lines around the current occurrence and the ones mentioned at
124-125 and 164-165) so all absolute-path assertions are relaxed consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b6a3e21d-d53f-4a81-a81d-72e2020ce896

📥 Commits

Reviewing files that changed from the base of the PR and between 3234df8 and 9ee4097.

📒 Files selected for processing (9)
  • client/ssh/server/getent_cgo_unix.go
  • client/ssh/server/getent_nocgo_unix.go
  • client/ssh/server/getent_test.go
  • client/ssh/server/getent_unix.go
  • client/ssh/server/getent_unix_test.go
  • client/ssh/server/getent_windows.go
  • client/ssh/server/shell.go
  • client/ssh/server/user_utils.go
  • client/ssh/server/userswitching_unix.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • client/ssh/server/user_utils.go
  • client/ssh/server/getent_nocgo_unix.go
  • client/ssh/server/getent_cgo_unix.go

Comment thread client/ssh/server/getent_test.go Outdated
@lixmal lixmal force-pushed the fix/ssh-nss-user-lookup branch from 9ee4097 to 9c9483b Compare March 5, 2026 12:29
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Mar 5, 2026

@lixmal lixmal merged commit 529c031 into main Mar 13, 2026
40 checks passed
@lixmal lixmal deleted the fix/ssh-nss-user-lookup branch March 13, 2026 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can't ssh onto a user defined in ldap

2 participants